AITopics | attack loss

Collaborating Authors

attack loss

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

4cddc8fc57039f8fe44e23aba1e4df40-Paper-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 13:56:45 GMT

attack strategy, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California > Yolo County > Davis (0.14)

Industry:

Information Technology > Security & Privacy (0.52)
Leisure & Entertainment > Games (0.45)
Energy > Energy Storage (0.45)
Government > Military (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak

Wang, Hao, Li, Hao, Zhu, Junda, Wang, Xinyuan, Pan, Chengwei, Huang, MinLie, Sha, Lei

arXiv.org Artificial IntelligenceJan-4-2025

Large Language Models (LLMs) are susceptible to generating harmful content when prompted with carefully crafted inputs, a vulnerability known as LLM jailbreaking. As LLMs become more powerful, studying jailbreak methods is critical to enhancing security and aligning models with human values. Traditionally, jailbreak techniques have relied on suffix addition or prompt templates, but these methods suffer from limited attack diversity. This paper introduces DiffusionAttacker, an end-to-end generative approach for jailbreak rewriting inspired by diffusion models. Our method employs a sequence-to-sequence (seq2seq) text diffusion model as a generator, conditioning on the original prompt and guiding the denoising process with a novel attack loss. Unlike previous approaches that use autoregressive LLMs to generate jailbreak prompts, which limit the modification of already generated tokens and restrict the rewriting space, DiffusionAttacker utilizes a seq2seq diffusion model, allowing more flexible token modifications. This approach preserves the semantic content of the original prompt while producing harmful content. Additionally, we leverage the Gumbel-Softmax technique to make the sampling process from the diffusion model's output distribution differentiable, eliminating the need for iterative token search. Extensive experiments on Advbench and Harmbench demonstrate that DiffusionAttacker outperforms previous methods across various evaluation metrics, including attack success rate (ASR), fluency, and diversity.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.17522

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Cost Aware Untargeted Poisoning Attack against Graph Neural Networks,

Han, Yuwei, Lai, Yuni, Zhu, Yulin, Zhou, Kai

arXiv.org Artificial IntelligenceDec-12-2023

Graph Neural Networks (GNNs) have become widely used in the field of graph mining. However, these networks are vulnerable to structural perturbations. While many research efforts have focused on analyzing vulnerability through poisoning attacks, we have identified an inefficiency in current attack losses. These losses steer the attack strategy towards modifying edges targeting misclassified nodes or resilient nodes, resulting in a waste of structural adversarial perturbation. To address this issue, we propose a novel attack loss framework called the Cost Aware Poisoning Attack (CA-attack) to improve the allocation of the attack budget by dynamically considering the classification margins of nodes. Specifically, it prioritizes nodes with smaller positive margins while postponing nodes with negative margins. Our experiments demonstrate that the proposed CA-attack significantly enhances existing attack strategies

attack loss, neural network, node, (12 more...)

arXiv.org Artificial Intelligence

2312.07158

Country:

North America > United States (0.14)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.74)

Add feedback

Simple and Efficient Partial Graph Adversarial Attack: A New Perspective

Zhu, Guanghui, Chen, Mengyu, Yuan, Chunfeng, Huang, Yihua

arXiv.org Artificial IntelligenceAug-15-2023

As the study of graph neural networks becomes more intensive and comprehensive, their robustness and security have received great research interest. The existing global attack methods treat all nodes in the graph as their attack targets. Although existing methods have achieved excellent results, there is still considerable space for improvement. The key problem is that the current approaches rigidly follow the definition of global attacks. They ignore an important issue, i.e., different nodes have different robustness and are not equally resilient to attacks. From a global attacker's view, we should arrange the attack budget wisely, rather than wasting them on highly robust nodes. To this end, we propose a totally new method named partial graph attack (PGA), which selects the vulnerable nodes as attack targets. First, to select the vulnerable items, we propose a hierarchical target selection policy, which allows attackers to only focus on easy-to-attack nodes. Then, we propose a cost-effective anchor-picking policy to pick the most promising anchors for adding or removing edges, and a more aggressive iterative greedy-based attack method to perform more efficient attacks. Extensive experimental results demonstrate that PGA can achieve significant improvements in both attack effect and attack efficiency compared to other existing graph global attack methods.

artificial intelligence, machine learning, node, (17 more...)

arXiv.org Artificial Intelligence

2308.07834

Country:

Asia > China > Jiangsu Province > Nanjing (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Add feedback

Efficient Adversarial Attacks on Online Multi-agent Reinforcement Learning

Liu, Guanlin, Lai, Lifeng

arXiv.org Artificial IntelligenceJul-14-2023

Due to the broad range of applications of multi-agent reinforcement learning (MARL), understanding the effects of adversarial attacks against MARL model is essential for the safe applications of this model. Motivated by this, we investigate the impact of adversarial attacks on MARL. In the considered setup, there is an exogenous attacker who is able to modify the rewards before the agents receive them or manipulate the actions before the environment receives them. The attacker aims to guide each agent into a target policy or maximize the cumulative rewards under some specific reward function chosen by the attacker, while minimizing the amount of manipulation on feedback and action. We first show the limitations of the action poisoning only attacks and the reward poisoning only attacks. We then introduce a mixed attack strategy with both the action poisoning and the reward poisoning. We show that the mixed attack strategy can efficiently attack MARL agents even if the attacker has no prior information about the underlying environment and the agents' algorithms.

attack strategy, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2307.0767

Country: North America > United States > California > Yolo County > Davis (0.14)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Certified Radius-Guided Attack Framework to Image Segmentation Models

Qu, Wenjie, Li, Youqi, Wang, Binghui

arXiv.org Artificial IntelligenceApr-5-2023

Image segmentation is an important problem in many safety-critical applications. Recent studies show that modern image segmentation models are vulnerable to adversarial perturbations, while existing attack methods mainly follow the idea of attacking image classification models. We argue that image segmentation and classification have inherent differences, and design an attack framework specially for image segmentation models. Our attack framework is inspired by certified radius, which was originally used by defenders to defend against adversarial perturbations to classification models. We are the first, from the attacker perspective, to leverage the properties of certified radius and propose a certified radius guided attack framework against image segmentation models. Specifically, we first adapt randomized smoothing, the state-of-the-art certification method for classification models, to derive the pixel's certified radius. We then focus more on disrupting pixels with relatively smaller certified radii and design a pixel-wise certified radius guided loss, when plugged into any existing white-box attack, yields our certified radius-guided white-box attack. Next, we propose the first black-box attack to image segmentation models via bandit. We design a novel gradient estimator, based on bandit feedback, which is query-efficient and provably unbiased and stable. We use this gradient estimator to design a projected bandit gradient descent (PBGD) attack, as well as a certified radius-guided PBGD (CR-PBGD) attack. We prove our PBGD and CR-PBGD attacks can achieve asymptotically optimal attack performance with an optimal rate. We evaluate our certified-radius guided white-box and black-box attacks on multiple modern image segmentation models and datasets. Our results validate the effectiveness of our certified radius-guided attack framework.

artificial intelligence, machine learning, perturbation, (16 more...)

arXiv.org Artificial Intelligence

2304.02693

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (0.93)
Information Technology > Security & Privacy (0.68)
Banking & Finance (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

What Does the Gradient Tell When Attacking the Graph Structure

Liu, Zihan, Wang, Ge, Luo, Yun, Li, Stan Z.

arXiv.org Artificial IntelligenceMar-29-2023

Recent research has revealed that Graph Neural Networks (GNNs) are susceptible to adversarial attacks targeting the graph structure. A malicious attacker can manipulate a limited number of edges, given the training labels, to impair the victim model's performance. Previous empirical studies indicate that gradient-based attackers tend to add edges rather than remove them. In this paper, we present a theoretical demonstration revealing that attackers tend to increase inter-class edges due to the message passing mechanism of GNNs, which explains some previous empirical observations. By connecting dissimilar nodes, attackers can more effectively corrupt node features, making such attacks more advantageous. However, we demonstrate that the inherent smoothness of GNN's message passing tends to blur node dissimilarity in the feature space, leading to the loss of crucial information during the forward process. To address this issue, we propose a novel surrogate model with multi-level propagation that preserves the node dissimilarity information. This model parallelizes the propagation of unaggregated raw features and multi-hop aggregated features, while introducing batch normalization to enhance the dissimilarity in node representations and counteract the smoothness resulting from topological aggregation. Our experiments show significant improvement with our approach.Furthermore, both theoretical and experimental evidence suggest that adding inter-class edges constitutes an easily observable attack pattern. We propose an innovative attack loss that balances attack effectiveness and imperceptibility, sacrificing some attack effectiveness to attain greater imperceptibility. We also provide experiments to validate the compromise performance achieved through this attack loss.

artificial intelligence, attacker, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2208.12815

Country:

Europe > France (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Learning When to Use Adaptive Adversarial Image Perturbations against Autonomous Vehicles

Yoon, Hyung-Jin, Jafarnejadsani, Hamidreza, Voulgaris, Petros

arXiv.org Artificial IntelligenceMar-15-2023

The deep neural network (DNN) models for object detection using camera images are widely adopted in autonomous vehicles. However, DNN models are shown to be susceptible to adversarial image perturbations. In the existing methods of generating the adversarial image perturbations, optimizations take each incoming image frame as the decision variable to generate an image perturbation. Therefore, given a new image, the typically computationally-expensive optimization needs to start over as there is no learning between the independent optimizations. Very few approaches have been developed for attacking online image streams while considering the underlying physical dynamics of autonomous vehicles, their mission, and the environment. We propose a multi-level stochastic optimization framework that monitors an attacker's capability of generating the adversarial perturbations. Based on this capability level, a binary decision attack/not attack is introduced to enhance the effectiveness of the attacker. We evaluate our proposed multi-level image attack framework using simulations for vision-guided autonomous vehicles and actual tests with a small indoor drone in an office environment. The results show our method's capability to generate the image attack in real-time while monitoring when the attacker is proficient given state estimates.

artificial intelligence, machine learning, perturbation, (19 more...)

arXiv.org Artificial Intelligence

2212.13667

Country:

North America > United States > Nevada > Washoe County > Reno (0.14)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

ESTAS: Effective and Stable Trojan Attacks in Self-supervised Encoders with One Target Unlabelled Sample

Xue, Jiaqi, Lou, Qian

arXiv.org Artificial IntelligenceJan-24-2023

Emerging self-supervised learning (SSL) has become a popular image representation encoding method to obviate the reliance on labeled data and learn rich representations from large-scale, ubiquitous unlabelled data. Then one can train a downstream classifier on top of the pre-trained SSL image encoder with few or no labeled downstream data. Although extensive works show that SSL has achieved remarkable and competitive performance on different downstream tasks, its security concerns, e.g, Trojan attacks in SSL encoders, are still not well-studied. In this work, we present a novel Trojan Attack method, denoted by ESTAS, that can enable an effective and stable attack in SSL encoders with only one target unlabeled sample. In particular, we propose consistent trigger poisoning and cascade optimization in ESTAS to improve attack efficacy and model accuracy, and eliminate the expensive target-class data sample extraction from large-scale disordered unlabelled data. Our substantial experiments on multiple datasets show that ESTAS stably achieves > 99% attacks success rate (ASR) with one target-class sample. Compared to prior works, ESTAS attains > 30% ASR increase and > 8.3% accuracy improvement on average.

artificial intelligence, encoder, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2211.10908

Country:

North America > United States > Florida > Orange County > Orlando (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (0.89)
Education (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback